Protein Structure Prediction: Selecting Salient Features from Large Candidate Pools

Authors

  • Kevin J. Cherkauer
  • Jude W. Shavlik
Abstract

We introduce a parallel approach, "DT-SELECT," for selecting features used by inductive learning algorithms to predict protein secondary structure. DT-SELECT is able to rapidly choose small, nonredundant feature sets from pools containing hundreds of thousands of potentially useful features. It does this by building a decision tree, using features from the pool, that classifies a set of training examples. The features included in the tree provide a compact description of the training data and are thus suitable for use as inputs to other inductive learning algorithms. Empirical experiments in the protein secondary-structure task, in which sets of complex features chosen by DT-SELECT are used to augment a standard artificial neural network representation, yield surprisingly little performance gain, even though features are selected from very large feature pools. We discuss some possible reasons for this result.
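The core idea in the abstract is simple: grow a decision tree over the candidate pool, then keep only the features the tree actually tests. Below is a minimal, serial sketch of that idea. It assumes boolean features and ID3-style information gain. The function names, the toy pool of "residue a at window position i" tests, and the six-window dataset are hypothetical illustrations. This is not the authors' DT-SELECT implementation, and it does not use their feature pools. The scoring loop is where a parallel version would split the work across processors.

```python
import math
from collections import Counter
from typing import Callable, Dict, Optional, Sequence, Set

Feature = Callable[[str], bool]  # a boolean test on one example (here, a sequence window)


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(examples: Sequence[str], labels: Sequence[str], feature: Feature) -> float:
    """Entropy reduction obtained by splitting the examples on one boolean feature."""
    yes = [y for x, y in zip(examples, labels) if feature(x)]
    no = [y for x, y in zip(examples, labels) if not feature(x)]
    n = len(labels)
    remainder = len(yes) / n * entropy(yes) + len(no) / n * entropy(no)
    return entropy(labels) - remainder


def tree_select_features(examples: Sequence[str], labels: Sequence[str],
                         pool: Dict[str, Feature], min_gain: float = 1e-9,
                         max_depth: Optional[int] = None, depth: int = 0) -> Set[str]:
    """Hypothetical helper: grow a tree greedily, return names of the features it tests."""
    if len(set(labels)) <= 1 or (max_depth is not None and depth >= max_depth):
        return set()  # pure node or depth limit reached: this node becomes a leaf
    # Score every candidate in the pool. This is the expensive step that a parallel
    # version would distribute; this sketch runs it serially.
    scores = {name: info_gain(examples, labels, f) for name, f in pool.items()}
    best = max(scores, key=scores.get)  # ties go to the first feature in pool order
    if scores[best] <= min_gain:
        return set()  # no candidate separates these examples: make a leaf
    test = pool[best]
    used = {best}
    for branch in (True, False):  # recurse into the examples on each side of the split
        subset = [(x, y) for x, y in zip(examples, labels) if test(x) == branch]
        if subset:
            xs, ys = zip(*subset)
            used |= tree_select_features(list(xs), list(ys), pool,
                                         min_gain, max_depth, depth + 1)
    return used


AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_pool(window: int) -> Dict[str, Feature]:
    """Hypothetical candidate pool: one 'residue a at position i' test per (i, a) pair."""
    return {f"pos{i}={a}": (lambda w, i=i, a=a: w[i] == a)
            for i in range(window) for a in AMINO_ACIDS}


# Toy data (made up for illustration): 3-residue windows labeled
# helix (H), strand (E), or coil (C) at the central residue.
windows = ["AEL", "KEL", "GPG", "NPG", "VIV", "TIY"]
labels = ["H", "H", "C", "C", "E", "E"]
selected = tree_select_features(windows, labels, residue_pool(3))
print(sorted(selected))  # ['pos1=E', 'pos1=I']
```

Out of 60 candidates, only two are kept. Any other test that would separate the same examples (for example `pos2=L`, which is redundant with `pos1=E` on this data) gains nothing once the tree has split on its counterpart. This is why a tree tends to produce small, nonredundant sets. The selected names could then be evaluated on each window and appended to a network's usual input encoding, in the spirit of the augmentation the abstract describes.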


Similar sources


An On/Off Lattice Approach to Protein Structure Prediction from Contact Maps

An important unsolved problem in structural bioinformatics is that of protein structure prediction (PSP), the reconstruction of a biologically plausible three-dimensional structure for a protein given only its amino acid sequence. The PSP problem is of enormous interest, because the function of proteins is a direct consequence of their three-dimensional structure. Approaches to solve the ...


Automatic feature selection from a large number of features for phone duration prediction

The present research investigates automatic feature selection for phone duration prediction for computer text-to-speech (TTS), selecting from a large set of 242 candidate features. Two methods for avoiding overfitting the training data are evaluated. Experiments with an American English voice corpus show that automatic feature selection using n-fold cross validation combined with a simple per-f...


Cloning and characterization of MAP2191 gene, a mammalian cell entry antigen of Mycobacterium avium subspecies paratuberculosis

The aim of this study is to identify, clone and express a Mycobacterium avium subsp. paratuberculosis specific immunogenic antigen candidate, in order to develop better reagents for diagnosis and vaccines for the protection of the host. Therefore, the MAP2191 gene (a member of the MAPmce5 operon) from MAP was isolated and characterized using bioinformatics tools and ...


Comparison of Different Machine Learning Methods for Extractive Speech-to-Speech Summarization of Persian Without Using Transcripts

This paper investigates extractive speech summarization using different machine learning algorithms. Speech summarization extracts the important and salient segments of speech so that speech files can be accessed, searched, extracted and browsed more easily and at lower cost. In this paper, a new method for speech summarization without using automatic speech recognitio...


Journal:
  • Proceedings. International Conference on Intelligent Systems for Molecular Biology

Volume 1

Publication date: 1993